1. INTRODUCTION

The equivalence problem is to determine the finest partition
on a set that is consistent with a sequence of assertions of the
form "x ≡ y". A strategy for doing this on a computer processes
the assertions serially, always maintaining in storage a
representation of the partition defined by the assertions
encountered so far. To process the command "x ≡ y", the
equivalence classes of x and y are determined. If they are the
same, nothing further is done; otherwise the two classes are
merged together.

Galler and Fischer (1964A) give an algorithm for solving this
problem based on tree structures, and it also appears in Knuth
(1968A). The items in each equivalence class are arranged in a
tree, and each item except for the root contains a pointer to its
father. The root contains a flag indicating that it is a root,
and it may also contain other information relevant to the
equivalence class as a whole.

Two operations are involved in processing a command "x ≡ y":
first we must find the classes containing x and y, and then these
classes are (possibly) merged together. The find is accomplished
by successively following the father links up the path from the
given node until the root is encountered. To merge two trees
together, the root of one is attached to the root of the other,
and the former node is marked to indicate that it is no longer a
root in the new data structure.

The time required to accomplish a find depends on the length
of the path from the given node to the root of its tree, while the
time to process a merge (given the roots of the two trees involved)
is a constant. For definiteness, we let the cost of a merge be
unity and the cost of a find be the number of nodes, including the
endpoints, on the path from the given node to the root.
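
The basic algorithm and its cost measure can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the class name Forest and the method names are
hypothetical, and the father links are kept in a dictionary rather
than in the storage layout of the original programs.

```python
class Forest:
    """Basic tree-based equivalence structure, with no heuristics."""

    def __init__(self, items):
        # father[x] is None exactly when x is a root.
        self.father = {x: None for x in items}

    def find(self, x):
        """Return (root, cost); cost counts the nodes on the path, endpoints included."""
        cost = 1
        while self.father[x] is not None:
            x = self.father[x]
            cost += 1
        return x, cost

    def merge(self, r1, r2):
        """Attach root r1 to root r2; a merge has unit cost."""
        assert self.father[r1] is None and self.father[r2] is None
        self.father[r1] = r2
        return 1

    def assert_equivalent(self, x, y):
        """Process the assertion 'x equivalent to y'; return the cost incurred."""
        rx, cx = self.find(x)
        ry, cy = self.find(y)
        merge_cost = self.merge(rx, ry) if rx != ry else 0
        return cx + cy + merge_cost
```

Each assertion costs two finds plus, when the classes differ, one
merge, so all the variation in cost comes from the path lengths.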

In this paper, we are interested in the way the cost of a
sequence of instructions grows as a function of its length. Using
the above algorithm, a sequence of n merge instructions can cause
a tree to be built with a node v of depth n, so subsequent finds
on that node will cost n+1 units each. The sequence consisting of
the n merge instructions followed by n copies of a find(v)
instruction will then cost n + n(n+1) = n(n+2), and it is easy to
see that O(n^2) is an upper bound as well.
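
The quadratic example can be replayed directly. The sketch below
assumes nodes numbered 0..n and a hypothetical helper name
chain_cost; it builds the path by n merges and then charges n finds
on the deepest node, with no collapsing.

```python
def chain_cost(n):
    """Cost of n merges that build a path, followed by n finds on its deepest node."""
    father = {i: None for i in range(n + 1)}
    total = 0
    for i in range(n):
        # Merge: the current root i is attached to the new root i + 1.
        father[i] = i + 1
        total += 1
    for _ in range(n):
        # find(0) without collapsing: count every node on the path.
        node, cost = 0, 1
        while father[node] is not None:
            node = father[node]
            cost += 1
        total += cost
    return total
```

For every n this returns n + n(n+1) = n(n+2), matching the count in
the text.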

The above example suggests adding to the algorithm a "collapsing
rule", which Knuth (1968A) attributes to Tritter. Every time
a find instruction is executed, a second pass is made up the path
from the given node to the root and each node on that path is
attached directly to the root (except for the root itself). At
worst this will only double the cost of the algorithm, and it may
cause subsequent finds to be greatly speeded up. Indeed this
turns out to be the case, for in Section 4 we show that the upper
bound drops to O(n^{3/2}) using this heuristic.

Another heuristic, the "weighting rule", was studied by
Hopcroft and Ullman (1971A) and previously known to several others.
When performing a merge, an attempt is made to keep the trees
balanced by always attaching the tree with the smaller number of
nodes to the root of the tree with the larger number. To do this
efficiently requires that extra storage be associated with each
root in which to record the number of nodes in its tree. Hopcroft
and Ullman show that with the weighting, a tree of n nodes can
have height at most log n, and it follows that an instruction
sequence of length n > 1 can therefore have cost no greater than
O(n log n). Moreover, in the absence of the collapsing rule, it
is easy to construct instruction sequences whose cost does grow as
n log n.

Combining the collapsing rule with the weighting rule yields
an algorithm superior to those using either heuristic alone. With
only the collapsing rule, we exhibit in Section 3 sequences whose
cost grows proportionally to n log n, where n is the length of the
sequence, and as we remarked above, a similar lower bound holds
for just the weighting rule alone. Combining both heuristics, we
derive in Section 4 an O(n log log n) upper bound. Hopcroft and
Ullman (1971A) claim that the upper bound is actually linear.
However, we have a counterexample to one of their earlier lemmas,
and although this difficulty can be overcome, we are unable to
follow the final part of their argument.

2. THE ALGORITHMS

An equivalence program over the set E is any sequence of
instructions of the form find(a) where a is an element of E, or
merge(A,B,C) where A, B and C are names of equivalence classes.
(Cf. Hopcroft and Ullman (1971A).) Find(a) returns the name of
the equivalence class of which a is a member, and merge(A,B,C)
combines classes A and B into a single new class C.

We now consider two algorithms which can be used to implement
equivalence programs. We first need some notation.

A forest F is a set of oriented (unordered) trees over some
set V(F) of nodes. If v is a node, then depth[F](v) is the length
of the path in F from v to a root, and height[F](v) is the maximum
length of a path in F from v to a leaf. The depth and height will
be written simply depth(v) and height(v) when the forest F is
understood. The height of a tree A, height(A), is the height of
its root.

The algorithms are built from three kinds of instructions
which operate on a forest F. If v is a node, then find(v) does
the following:

1. If v is a root, or if father(v) is a root, then F is left
unchanged.

2. Otherwise, let v = v_0, v_1, ..., v_k be the (unique) path from
v to the root v_k. Then F is modified by making v_k the father of
each of the nodes v_0, ..., v_{k-2}.

The cost of find(v) is 1 + depth(v).
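
The two cases of find(v) collapse into one loop when the path is
collected first. The following is a minimal sketch, assuming the
forest is stored as a dictionary father in which roots map to None;
that representation is an assumption of the example.

```python
def find(father, v):
    """Collapsing find on a father-pointer forest; returns (root, cost)."""
    path = [v]
    while father[path[-1]] is not None:
        path.append(father[path[-1]])
    root = path[-1]
    # Second pass: v_0, ..., v_{k-2} get the root as father.  The root and
    # its child are skipped, so case 1 leaves the forest unchanged.
    for node in path[:-2]:
        father[node] = root
    return root, len(path)  # len(path) equals 1 + depth(v)
```

A first find on the bottom of a four-node path costs 4 and flattens
the path; repeating it costs 2.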

The instruction U-merge(u,v) has unit cost and is defined
only when u and v are both roots. It causes the node u to become
a direct descendant of v (and hence u is no longer a root).

For any node v, let weight(v) be the number of nodes in the
subtree rooted by v (and including v itself). The instruction
W-merge(u,v) also has unit cost and is defined only when u and v
are both roots. If weight(u) < weight(v), it behaves exactly like
U-merge(u,v); otherwise, it causes the node v to become a direct
descendant of u.
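
The two merge instructions can be sketched as follows. The
dictionary representation and the function names are assumptions
of the example, and weight is recomputed from scratch for clarity;
a practical program would store the count at each root, as the
introduction notes.

```python
def u_merge(father, u, v):
    """U-merge(u, v): root u becomes a direct descendant of root v."""
    assert u != v and father[u] is None and father[v] is None
    father[u] = v


def weight(father, v):
    """Number of nodes in the subtree rooted at v, including v itself."""
    def lies_below(x):
        while x is not None:
            if x == v:
                return True
            x = father[x]
        return False
    return sum(1 for x in father if lies_below(x))


def w_merge(father, u, v):
    """W-merge(u, v): behaves like U-merge(u, v) when u is strictly lighter."""
    if weight(father, u) < weight(father, v):
        u_merge(father, u, v)
    else:
        u_merge(father, v, u)
```

With the test as stated in the text, a tie attaches v beneath u.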

We define a U-program to be any sequence of instructions
consisting solely of finds and U-merges. Similarly, a W-program is
any sequence of finds and W-merges.

Let α be a U- (W-)program. Then T(α) is the total cost of
executing the instructions of α in sequence, starting from an
initial forest F_0 in which every node is a root. T(α) is undefined
if any of the instructions in α is undefined.


3. A LOWER BOUND FOR THE COST OF THE UNWEIGHTED ALGORITHM 

In this section, we show how to find, for each n > 0, a
U-program α of length n such that T(α) ≥ cn(log n) for some
constant c independent of n.

We begin by defining inductively for each n a class S_n of
trees:

(i) Any tree consisting of just a single node is an S_0 tree.

(ii) Let A and B be S_{n-1} trees, and assume that A and B have
no nodes in common. Then the tree obtained by attaching the root
of A to the root of B is an S_n tree.

Figure 3.1 illustrates the building of an S_n tree, and Figure 3.2
shows an example of such a tree.

Lemma 3.1. Let A be an S_n tree. Then A has 2^n nodes,
height(A) = n, and A contains a unique node of depth n.

Proof. Trivial induction on n. □ 
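
The inductive definition translates into a short recursive builder,
which makes the three claims of Lemma 3.1 easy to check for small n.
This is a sketch only: nodes are numbered in order of creation and
the names build_S and depth are hypothetical.

```python
def build_S(n, father):
    """Add an S_n tree to the forest `father`; return (root, deepest node)."""
    if n == 0:
        v = len(father)          # a fresh node label
        father[v] = None
        return v, v
    root_a, deepest_a = build_S(n - 1, father)
    root_b, _ = build_S(n - 1, father)
    father[root_a] = root_b      # attach the root of A to the root of B
    return root_b, deepest_a     # the node of depth n lies inside A


def depth(father, v):
    d = 0
    while father[v] is not None:
        v, d = father[v], d + 1
    return d
```

For example, build_S(4, {}) produces 16 nodes with exactly one node
at depth 4.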



Figure 3.1. Definition of an S_n tree.




In light of the lemma, we define the handle of an S_n tree to
be the unique node of depth n.

Two alternate characterizations of S_n trees are illustrated
in Figure 3.3 and stated in:

Lemma 3.2. Let A be an S_n tree with handle v.

(a) There exist disjoint trees A_0, ..., A_{n-1} not containing v
with roots a_0, ..., a_{n-1} respectively such that (1) A_i is an
S_i tree, 0 ≤ i ≤ n-1, and (2) A is the result of attaching v to
a_0 and a_i to a_{i+1} for each i, 0 ≤ i < n-1.

(b) There exist disjoint trees A_0, ..., A_{n-1} with roots
a_0, ..., a_{n-1} respectively and a node u not in any A_i such
that (1) A_i is an S_i tree, 0 ≤ i ≤ n-1, and (2) A is the result
of attaching a_i to u for each i, 0 ≤ i ≤ n-1. Moreover, v is the
handle of A_{n-1}.

Proof. Again the proof is a trivial induction on n and is
omitted. □



Figure 3.3. Decompositions of an S_n tree A.










The remarkable property of an S_n tree is that it is
self-reproducing in the sense that if an S_n tree A is embedded in
a larger tree B so that the root of A has depth > 0 in B, then a
find on the handle of A (which collapses the path above the handle)
costs at least n+2 and the resulting tree still has an S_n tree
embedded in it.

We now make these notions more precise.

Definition. Let A and B be trees. A one-one function
η: V(A) → V(B) is an embedding of A in B if for all u,v ∈ V(A),
u = father(v) iff η(u) = father(η(v)). η is initial (proper) if η
maps (does not map) the root of A onto the root of B. We say that
A is initially (properly) embeddable in B if there exists an
initial (proper) embedding of A in B.

Lemma 3.3. Let A be an S_n tree with handle v, and assume η
is a proper embedding of A in a tree P. Then A' is initially
embeddable in the tree P', where A' is an S_n tree and P' results
from the instruction find(η(v)) on P.

Proof. The trees described below are illustrated in Figure
3.4.


Let A be an S_n tree with handle v, and assume η is a proper
embedding of A in P. By Lemma 3.2(a), we may assume that
v, a_0, ..., a_{n-1} is the path from v to the root of A, and
a_0, ..., a_{n-1} are the roots of disjoint subtrees
A_0, ..., A_{n-1} respectively, where each A_i is an S_i tree,
0 ≤ i ≤ n-1.

For each i, 0 ≤ i ≤ n-1, let P_i be the subtree of P consisting
of the nodes in {η(u) | u ∈ V(A_i)}.

Let A' be the tree formed as in Lemma 3.2(b) by linking each of
the nodes a_i to a new node a'. Then A' is an S_n tree.

Let P' result from the execution of the instruction find(η(v))
on P, and let p be the root of P'.

Finally, define a mapping η' from the nodes of A' to the nodes
of P':

\[ \eta'(u) = \begin{cases} \eta(u) & \text{if } u \in V(A_i) \text{ for some } i,\ 0 \le i \le n-1, \\ p & \text{if } u = a'. \end{cases} \]



Figure 3.4. Trees in the proof of Lemma 3.3.




















It remains to show that η' is an initial embedding of A' in
P'.

Let π be the path from η(v) to the root of P. From the
definition of embedding, each of the nodes η(v), η(a_0), ...,
η(a_{n-1}) appears on π, and no node in P_i except for η(a_i) is
in π, 0 ≤ i ≤ n-1.

As a consequence of the find, each of the nodes η(a_i) is
linked directly to the root p of P', and since the path π did not
run through any node of P_i except for the root, P_i is a subtree
of P' linked directly to p. It is easily verified that η' is an
initial embedding of A' in P'. □

We now construct a costly U-program. First build an S_k tree.
Then alternately "push" it down by merging it to a new node, and
perform a find on the handle. This find costs k+2 units and it
leaves us with a new tree in which an S_k tree is initially
embedded. Thus we can repeat the "merge, find" sequence as often as
we wish, yielding an average instruction time that approaches
(k+3)/2. Since we can do this for arbitrary k, the cost of
U-programs cannot be linear in their length. In fact, we show:

Theorem 1. For any n > 0, there exists a U-program α of length
n such that T(α) ≥ cn(log n) for some constant c independent of n.

Proof. Let a_1, a_2, ... be a sequence of distinct nodes, and
let β be a program of 2^k - 1 U-merges which builds an S_k tree out
of the nodes a_1, ..., a_{2^k}. For each i ≥ 1, let v_i be the
handle and r_i the root of the tree that results from the sequence
β, γ_1, ..., γ_{i-1}, and define
γ_i = "U-merge(r_i, a_{2^k+i}), find(v_i)". Let α be the sequence
β, γ_1, ..., γ_m, where m = 2^k - 1. Then
T(α) = (2^k - 1) + m(k+3), and the length of α is n = 3(2^k - 1),
so k = log(n/3 + 1) and

\[ T(\alpha) = \frac{n}{3} + \frac{n}{3}(k+3) \ge cn(\log n) \tag{3.1} \]

for some constant c.

For n not of the form 3(2^k - 1), we form the next shorter
sequence that is of that form and then extend it arbitrarily to
get a sequence of length exactly n. This will have the effect
only of changing the constant in (3.1). □
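
The construction in the proof can be simulated to confirm the cost
formula for small k. The sketch below is an illustration under
assumptions: the embedded S_k tree is tracked in the form of Lemma
3.2(b), as a pair (root, [S_0 subtree, ..., S_{k-1} subtree]), and
the names find, build_S and costly_program are hypothetical.

```python
def find(father, v):
    """Collapsing find; returns its cost, 1 + depth(v)."""
    path = [v]
    while father[path[-1]] is not None:
        path.append(father[path[-1]])
    for node in path[:-2]:
        father[node] = path[-1]
    return len(path)


def build_S(k, father):
    """Build an S_k tree by U-merges; return (tree, number of merges)."""
    if k == 0:
        v = len(father)
        father[v] = None
        return (v, []), 0
    b, cost_b = build_S(k - 1, father)
    a, cost_a = build_S(k - 1, father)
    father[a[0]] = b[0]                      # U-merge(root of A, root of B)
    return (b[0], b[1] + [a]), cost_a + cost_b + 1


def costly_program(k):
    """Total cost of beta followed by m = 2**k - 1 'merge, find' pairs."""
    father = {}
    tree, total = build_S(k, father)
    for _ in range(2 ** k - 1):
        spine = [tree]                       # root of the S_k tree down to its handle
        while spine[-1][1]:
            spine.append(spine[-1][1][-1])
        new_root = len(father)
        father[new_root] = None
        father[tree[0]] = new_root           # push the tree down by one U-merge
        total += 1 + find(father, spine[-1][0])
        # As in Lemma 3.3: each spine node above the handle loses its spine
        # child and hangs from new_root, giving S_0, ..., S_{k-1} again.
        tree = (new_root, [(r, kids[:-1]) for r, kids in reversed(spine[:-1])])
    return total
```

Every find in the loop costs k+2, so the total is
(2^k - 1) + (2^k - 1)(k+3) = (2^k - 1)(k+4), in agreement with the
expression for T(α) above.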



4. UPPER BOUNDS


We get upper bounds on the two algorithms by considering a
slight generalization of a find instruction. Find(u,v) behaves
like a find(u) where we pretend that v is the root. More
precisely, find(u,v) is defined only if v is an ancestor of u. If
that is the case, let u = u_0, u_1, ..., u_k = v be the path from
u to v. Then find(u,v) causes each of the nodes u_0, ..., u_{k-2}
to be attached directly to v. Its cost is defined to be k+1. A
sequence of generalized find and U- (W-)merge instructions is
called a generalized U- (W-)program.
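
A generalized find differs from an ordinary one only in where the
upward walk stops. The following is a minimal sketch, assuming the
same dictionary representation of father links as an illustration;
raising ValueError for an undefined instruction is a choice made
for the example.

```python
def generalized_find(father, u, v):
    """find(u, v): attach u_0, ..., u_{k-2} directly to v; return the cost k + 1."""
    path = [u]
    while path[-1] != v:
        nxt = father[path[-1]]
        if nxt is None:
            raise ValueError("find(u, v) is undefined: v is not an ancestor of u")
        path.append(nxt)
    for node in path[:-2]:
        father[node] = v
    return len(path)
```

Choosing v to be the root of the tree containing u recovers the
ordinary find.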

Notation. Let F be a forest and α a program. Then F:α is
the forest that results from F by executing the instructions in α.

Lemma 4.1. Let u be any node in a forest F. Then there
exists a node v in F such that F:find(u) = F:find(u,v) and the
costs of executing find(u) and find(u,v) are the same.

Proof. Choose v to be the root of the tree containing u. □

Applying Lemma 4.1 in turn to each of the find instructions
in a U- or W-program α gives the following:

Lemma 4.2. Let α be a U- (W-)program and F a forest. Then
there exists a generalized U- (W-)program β such that F:α = F:β
and T(α) = T(β).

Generalized programs are convenient to deal with because there
is no loss of generality in restricting attention to programs in
which all the merges precede all the finds.

Lemma 4.3. Let F be a forest containing the nodes p, q, u
and v and let M be the instruction U-merge(p,q) (W-merge(p,q)).
Let α_1 = "find(u,v), M" and α_2 = "M, find(u,v)". If α_1 is
defined on F, then F:α_1 = F:α_2 and T(α_1) = T(α_2).

Proof. The only possible effects of M are to change the
father of p to be q, or to change the father of q to be p.
Similarly, the only possible effects of the instruction find(u,v)
are to change the fathers of the nodes on the path from u to v
(but not including the last two such nodes). Since α_1 is defined,
v is an ancestor of u and both p and q are roots in F; hence
the sets of father links changed by the two instructions are
disjoint. Moreover, the choice of whether to link p to q or q to p
in case M is a W-merge instruction depends only on the weights of
p and q, and the weight of a root is not affected by a find
instruction. Hence, neither instruction affects the action of the
other, so F:α_1 = F:α_2 and T(α_1) = T(α_2). □

Lemma 4.3 enables one to convert a generalized program into
an equivalent one in which all the merges precede all the finds.

Lemma 4.4. Let α be a generalized program, and let β result
from α by moving all the merge instructions left in the sequence
before all the finds, but preserving the order of the merges and
the order of the finds. Then F:α = F:β and T(α) = T(β).

To bound the cost of a generalized U-program, we consider the
effects of a U-merge and a generalized find instruction on the
total path length of a forest F, defined to be

\[ \sum_{v \in V(F)} \mathrm{depth}(v). \]

Lemma 4.5. Let α be a sequence of n U-merge instructions and
let F = F_0:α. Then the total path length of F is at most n^2.

Proof. No node in F can have depth > n, and at most n nodes
have non-zero depth. Hence, the total path length is at most
n^2. □

Lemma 4.6. A generalized find instruction of cost ℓ > 1
reduces the total path length by at least (ℓ-2)^2/2.

Proof. Let find(u,v) be an instruction of cost ℓ. Then there
is a path u = u_0, u_1, ..., u_{ℓ-1} = v from u to v. For each i,
0 ≤ i ≤ ℓ-3, the find causes the depth of node u_i to become one
plus the depth of v, so the reduction in total path length is at
least

\[ \sum_{i=0}^{\ell-3} \bigl(\mathrm{depth}(u_i) - (1 + \mathrm{depth}(v))\bigr) = \sum_{i=0}^{\ell-3} (\ell-2-i) = \sum_{j=1}^{\ell-2} j = \frac{(\ell-2)(\ell-1)}{2} \ge \frac{(\ell-2)^2}{2}. \]

□
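
The bound of Lemma 4.6 is attained most simply on a bare chain,
where nothing hangs off the path. The sketch below measures the
drop in total path length in that case; the function names are
hypothetical and the chain is only one convenient instance.

```python
def total_path_length(father):
    """Sum of depth(v) over all nodes v of the forest."""
    total = 0
    for v in father:
        while father[v] is not None:
            v = father[v]
            total += 1
    return total


def reduction_on_chain(l):
    """Drop in total path length caused by a cost-l generalized find on a chain."""
    father = {i: i + 1 for i in range(l - 1)}
    father[l - 1] = None                 # u = 0 and v = l - 1
    before = total_path_length(father)
    for node in range(l - 2):            # u_0, ..., u_{l-3} are attached to v
        father[node] = l - 1
    return before - total_path_length(father)
```

On a chain the reduction is exactly (ℓ-2)(ℓ-1)/2; subtrees hanging
off the path can only increase it.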

Theorem 2. Let α be a U-program of length n. Then
T(α) ≤ cn^{3/2} for some constant c independent of n.

Proof. By Lemma 4.2, it suffices to bound a generalized
U-program α instead, and by Lemma 4.4, we may assume that all the
U-merges in α precede all the finds.

A program of length n clearly has at most n merge instructions
and at most n find instructions. Let ℓ_i be the cost of the i-th
find instruction if there is one and 0 if not. Clearly,

\[ T(\alpha) \le n + \sum_{i=1}^{n} \ell_i. \tag{4.1} \]

By Lemma 4.5, the forest after executing the merge instructions
in α can have a total path length of at most n^2. Only the
find instructions of cost greater than two affect the tree, so let
I = {i | ℓ_i > 2}. If i ∈ I, Lemma 4.6 asserts that the i-th find
instruction decreases the total path length by at least
(ℓ_i - 2)^2/2. The total path length at the end cannot be
negative, so

\[ n^2 \ge \frac{1}{2} \sum_{i \in I} (\ell_i - 2)^2 \ge \frac{1}{2} \sum_{i=1}^{n} (\ell_i - 2)^2 - 2n \tag{4.2} \]

or

\[ 6n^2 \ge \sum_{i=1}^{n} (\ell_i - 2)^2. \tag{4.3} \]

The maximum value for \sum_{i=1}^{n} ℓ_i is achieved when all the
ℓ_i's are equal, for if they are not all the same, replacing each
by the mean ℓ can only cause \sum_{i=1}^{n} (ℓ_i - 2)^2 to
decrease. Hence, from (4.1) and (4.3) we get

\[ T(\alpha) \le n + n\ell \tag{4.4} \]

where ℓ is subject to the constraint that

\[ 6n^2 \ge n(\ell - 2)^2. \tag{4.5} \]

From (4.5),

\[ \ell \le 2 + \sqrt{6n}, \tag{4.6} \]

and substituting into (4.4), we get

\[ T(\alpha) \le n + n(2 + \sqrt{6n}) \le 6n^{3/2}. \tag{4.7} \]

□


For the case of the weighted algorithm, we prove an upper
bound of O(n log log n) using a method similar to our proof of
Theorem 2.

We say that a forest F is buildable if it can be obtained
from F_0 by a sequence of W-merge instructions. Buildable forests
have the important property that most nodes have low height.


Lemma 4.7 (Hopcroft and Ullman (1971A)). Let F be a buildable
forest. If v is a node in F of height h, then weight(v) ≥ 2^h.

Proof. The result follows readily by induction on h. We
leave the details to the reader. □

Corollary. Let α be a sequence of W-merge instructions of
length n and let F = F_0:α. For any h ≥ 0, F contains at most
n/2^h non-roots of height h.

Proof. F has exactly n non-roots, for each W-merge changes
one root to a non-root. Suppose u_1, ..., u_k are non-roots of
height h. By the Lemma, weight(u_i) ≥ 2^h and all the nodes
counted in the weight of u_i are non-roots, 1 ≤ i ≤ k. Hence,

\[ n \ge \sum_{i=1}^{k} \mathrm{weight}(u_i) \ge k \cdot 2^h \tag{4.8} \]

so k ≤ n/2^h. □
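
The corollary can be observed empirically on a randomly built
forest. The sketch below is an illustration only: the function
names, the use of a seeded random generator, and the stored weight
table are assumptions of the example, and num_merges must be
smaller than num_nodes.

```python
import random


def random_buildable_forest(num_nodes, num_merges, seed=0):
    """Apply num_merges W-merges to randomly chosen pairs of roots."""
    rng = random.Random(seed)
    father = {v: None for v in range(num_nodes)}
    weight = {v: 1 for v in range(num_nodes)}
    roots = list(range(num_nodes))
    for _ in range(num_merges):
        u, v = rng.sample(roots, 2)
        if weight[u] >= weight[v]:       # W-merge(u, v): otherwise v goes under u
            u, v = v, u
        father[u] = v
        weight[v] += weight[u]
        roots.remove(u)
    return father


def heights(father):
    """Map each node to the length of the longest path down to a leaf."""
    h = {v: 0 for v in father}
    for v in father:
        d, x = 0, v
        while father[x] is not None:
            x, d = father[x], d + 1
            h[x] = max(h[x], d)
    return h
```

Counting the non-roots at each height in such a forest should never
exceed n/2^h, where n is the number of merges performed.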


Instead of looking at total path length, we consider a
quantity Q(F,G) which depends on two forests F and G. Our interest
is in the case where F is a buildable forest and G results from F
by a sequence of generalized finds, although our definition applies
whenever V(F) = V(G):

\[ Q(F,G) = \sum_{v \in V(F)} \mathrm{depth}[G](v) \cdot 2^{\mathrm{height}[F](v)}. \tag{4.9} \]
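
The quantity Q(F,G) is straightforward to compute from two father
tables over the same node set. This is a minimal sketch; the
function names and the dictionary representation are assumptions of
the example.

```python
def depth(father, v):
    d = 0
    while father[v] is not None:
        v, d = father[v], d + 1
    return d


def height_map(father):
    """Map each node to the length of the longest path down to a leaf."""
    h = {v: 0 for v in father}
    for v in father:
        d, x = 0, v
        while father[x] is not None:
            x, d = father[x], d + 1
            h[x] = max(h[x], d)
    return h


def Q(F, G):
    """Sum over v of depth[G](v) * 2**height[F](v); F and G share their nodes."""
    h = height_map(F)
    return sum(depth(G, v) * 2 ** h[v] for v in F)
```

As a small check, take the buildable forest F with b and c attached
to a, and d attached to c. Then Q(F,F) = 1 + 2 + 2 = 5, and after a
find on d, which attaches d directly to a, the value drops to 4.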


Lemma 4.8. Let α be a sequence of W-merge instructions of
length n ≥ 1 and let F = F_0:α. Then Q(F,F) ≤ n(log(n+1))^2.

Proof. No tree in F can have more than n+1 nodes. By Lemma
4.7, a root can have height at most log(n+1), so no node has
height or depth greater than log(n+1).

Let N = {v ∈ V(F) | depth[F](v) > 0} be the set of non-roots
of F. From (4.9), we get

\[ Q(F,F) \le \log(n+1) \cdot \sum_{v \in N} 2^{\mathrm{height}[F](v)}. \tag{4.10} \]

We now wish to bound R(F) = \sum_{v ∈ N} 2^{height[F](v)}. Since a
root has height at most log(n+1), any node v ∈ N has height at most
H = log(n+1) - 1, so summing over the heights of nodes,

\[ R(F) = \sum_{h=0}^{\lfloor H \rfloor} (\text{number of nodes in } N \text{ of height } h) \cdot 2^h. \tag{4.11} \]

By the corollary to Lemma 4.7, the number of nodes in N of height
h is at most n/2^h, so

\[ R(F) \le \sum_{h=0}^{\lfloor H \rfloor} \left(\frac{n}{2^h}\right) \cdot 2^h \le (H+1)n = n \cdot \log(n+1). \tag{4.12} \]

Substituting (4.12) into (4.10) gives the desired result. □


Lemma 4.9. Let F be a buildable forest, φ a sequence of
generalized finds, and let G = F:φ. If u is a descendant of v in
G and u ≠ v, then height[F](u) < height[F](v).

Proof. It is easy to show by induction on the length of φ
that if u is a descendant of v in G, then u is also a descendant
of v in F. By the definition of height, it follows that
height[F](u) < height[F](v). □


Lemma 4.10. Let F be a buildable forest, φ a sequence of
generalized finds, and let G = F:φ. Assume find(u,v) is defined
on G, has cost ℓ > 2, and results in a forest G'. Then

\[ Q(F,G) - Q(F,G') \ge 2^{\ell-3}. \]

Proof. Let u = u_0, ..., u_{ℓ-1} = v be the path from u to v in
G. By Lemma 4.9, the heights in F of the nodes in the path are
monotone increasing, and since heights are integral,
height[F](u_{ℓ-3}) ≥ ℓ-3. The instruction find(u,v) does not
increase the depth of any node and it decreases the depth of
u_{ℓ-3} by one, so

\[ Q(F,G) - Q(F,G') \ge 2^{\mathrm{height}[F](u_{\ell-3})} \ge 2^{\ell-3}. \]

□


Theorem 3. Let α be a W-program of length n > 4. Then
T(α) ≤ cn(log log n) for some constant c independent of n.


Proof. By Lemmas 4.2 and 4.4, it suffices to prove the
theorem for a generalized W-program α = μφ of length n, where μ is
a sequence of W-merge instructions and φ is a sequence of
generalized find instructions.

The lengths of μ and φ are clearly both at most n. Let ℓ_i be
the cost of the i-th find instruction if there is one and 0 if not.
Then

\[ T(\alpha) \le n + \sum_{i=1}^{n} \ell_i. \tag{4.13} \]

Now, let F = F_0:μ. By Lemma 4.8,

\[ Q(F,F) \le n(\log(n+1))^2. \tag{4.14} \]

Only find instructions of cost greater than two affect the
forest, so let I = {i | ℓ_i > 2} and let G = F:φ. By repeated use
of Lemma 4.10,

\[ Q(F,F) - Q(F,G) \ge \sum_{i \in I} 2^{\ell_i - 3} \ge \Bigl( \sum_{i=1}^{n} 2^{\ell_i - 3} \Bigr) - n. \tag{4.15} \]

Since Q(F,G) ≥ 0, we conclude from (4.14) and (4.15) that

\[ n(\log(n+1))^2 \ge \sum_{i=1}^{n} 2^{\ell_i - 3} - n \tag{4.16} \]

so

\[ 2n(\log(n+1))^2 \ge \sum_{i=1}^{n} 2^{\ell_i - 3}. \tag{4.17} \]

The maximum value for \sum_{i=1}^{n} ℓ_i is achieved when all the
ℓ_i's are equal, for if they are not all the same, replacing each
by the mean ℓ can only cause \sum_{i=1}^{n} 2^{ℓ_i - 3} to
decrease. Hence, from (4.13) and (4.17), we get

\[ T(\alpha) \le n + n\ell \tag{4.18} \]

where ℓ is subject to the constraint that

\[ 2n(\log(n+1))^2 \ge n \cdot 2^{\ell - 3}. \tag{4.19} \]

Taking logarithms (to the base 2), we get

\[ \ell \le 3 + \log 2 + 2(\log\log(n+1)) \le 6(\log\log(n+1)). \tag{4.20} \]

Substituting back into (4.18) yields

\[ T(\alpha) \le n + 6n(\log\log(n+1)) \le 13n(\log\log n). \tag{4.21} \]

□


5. CONCLUSION

We have considered two heuristics, the collapsing rule and
the weighting rule, which purportedly improve the basic tree-based
equivalence algorithm. Our results, together with the remarks in
the introduction, show that each heuristic does indeed improve the
worst case behavior of the algorithm, and together they are better
than either alone.

There is still a considerable gap between the lower and upper
bounds we have been able to prove for the two algorithms employing
the collapsing rule, and we are unable to show even that the
weighted algorithm requires more than linear time. We leave as an
open problem to construct any equivalence algorithm at all which
can be proved to operate in linear time.